Tensor Multiplication on Parallel Computers

Author

  • Bryan Rasmussen
Abstract

One disadvantage of our approach is that it requires each processor to work on a large piece of the problem for a long time, thus increasing the probability that a single processor failure will sabotage the computation. Another limitation is that we must increase the number of processors in potentially large step-increments in order to take advantage of larger clusters. (The ideal number of processors is an integer multiple or divisor of the number of rows in u.)
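The processor-count constraint described above can be illustrated with a small sketch. The helper below is hypothetical (it is not from the paper); it simply enumerates the "ideal" processor counts for a given number of rows in u, assuming the paper's stated rule that the count should be an integer divisor or multiple of the row count.

```python
def valid_processor_counts(n_rows, max_procs):
    """Illustrative helper: processor counts that evenly partition the
    n_rows block-rows of u, i.e. integer divisors or multiples of n_rows.
    (Assumption based on the abstract's stated rule, not the paper's code.)"""
    divisors = {p for p in range(1, n_rows + 1) if n_rows % p == 0}
    multiples = {p for p in range(n_rows, max_procs + 1) if p % n_rows == 0}
    return sorted(p for p in divisors | multiples if p <= max_procs)

# For u with 12 rows on a cluster of up to 48 processors:
print(valid_processor_counts(12, 48))  # [1, 2, 3, 4, 6, 12, 24, 36, 48]
```

Note the large gaps between consecutive valid counts (12, 24, 36, 48) once the divisors are exhausted; this is the "large step-increments" limitation the abstract describes.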


Similar Articles

A New Parallel Matrix Multiplication Method Adapted on Fibonacci Hypercube Structure

The objective of this study was to develop a new optimal parallel algorithm for matrix multiplication that can run on a Fibonacci Hypercube structure. Most popular algorithms for parallel matrix multiplication cannot run on a Fibonacci Hypercube structure; a method that runs on all structures, and in particular on the Fibonacci Hypercube, is therefore necessary for parallel matr...


Experimental Evaluation of BSP Programming Libraries

The model of bulk-synchronous parallel computation (BSP) helps to implement portable general-purpose algorithms while keeping predictable performance on different parallel computers. Nevertheless, when programming in 'BSP style', the running time of an algorithm's implementation can depend heavily on the underlying communications library. In this study, an overview of existing approache...


Parallel Implementation of Multiple-Precision Arithmetic and 1,649,267,440,000 Decimal Digits of π Calculation

We present efficient parallel algorithms for multiple-precision arithmetic operations of more than several million decimal digits on distributed-memory parallel computers. A parallel implementation of floating-point real FFT-based multiplication is used because a key operation in fast multiple-precision arithmetic is multiplication. We also parallelized an operation of releasing propagated carr...


Generalized Hyper-Systolic Algorithm

We generalize the hyper-systolic algorithm proposed in [1] for abstract data structures on massively parallel computers with np processors. For a problem of size V, the communication complexity of the hyper-systolic algorithm is proportional to √(np·V), compared with np·V for the systolic case. The implementation technique is explained in detail and the example of the parallel matrix-matrix mu...




Journal:

Volume   Issue

Pages  -

Publication year: 2007